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Abstract 

The Radon transform and its adjoint, the back-projection operator, can both be 
expressed as convolutions in log-polar coordinates. Hence, fast algorithms for the 
application of the operators can be constructed by using FFT, if data is resampled at 
log-polar coordinates. Radon data is typically measured on an equally spaced grid 
in polar coordinates, and reconstructions are represented (as images) in Cartesian 
coordinates. Therefore, in addition to FFT, several steps of interpolation have to be 
conducted in order to apply the Radon transform and the back-projection operator 
by means of convolutions. However, in comparison to the interpolation conducted in 
Fourier-based gridding methods, the interpolation performed in the Radon and im¬ 
age domains will typically deal with functions that are substantially less oscillatory. 
Reasonable reconstruction results can thus be expected using interpolation schemes 
of moderate order. It also provides better control over the artifacts that can appear 
due to measurement errors. 

Both the interpolation and the FFT operations can be efficiently implemented 
on Graphical Processor Units (GPUs). For the interpolation, it is possible to make 
use of the fact that linear interpolation is hard-wired on GPUs, meaning that it has 
the same computational cost as direct memory access. Gubic order interpolation 
schemes can be constructed by combining linear interpolation steps which provides 
important computation speedup. 

We provide details about how the Radon transform and the back-projection can 
be implemented efficiently as convolution operators on GPUs. For large data sizes, 
speedups of about 10 times are obtained in relation to the computational times of 
other software packages based on GPU implementations of the Radon transform 
and the back-projection operator. Moreover, speedups of more than a 1000 times 
are obtained against the GPU-implementations provided in the MATLAB image 
processing toolbox. 
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Figure 1: Scheme of computing projections for a given angle 9. 

1 Introduction 

The two-dimensional Radon transform is the mapping of functions to their line integrals, 
i.e., a mapping 7^ : —)• x M where denotes the unit circle, defined by 

TZf{6,s) = j f{x)6{x-9 — s)dx. (1) 

The parameter 9 represents the (normal) direction of the lines, and the parameter s 
denotes the (signed) distance of the line to the origin. It is customary to use 9 both as a 
point on the unit sphere and as an angle, i.e., the notation x ■ 9 is used to parameterize 
lines in by the relation x ■ 9 = xi cos{9) + X 2 sin(0). Note that each line is defined 
twice in this definition, since s can take both positive and negative, and since 9 £ S^. 

A schematic illustration of the Radon transform is depicted in Figurej^ where beams 
are propagating through an object and after absorption are measured by receivers. The 
Radon transform appears for instance in computational tomography (CT). The tomo¬ 
graphic inversion problem lies in recovering an unknown function / given knowledge of 
TZf. For more details about CT, see O ESI [23125]. 

One of the most popular methods of inverting the usual Radon transform is by means 
of the filtered back-projection (FBP) method [25|. It uses the inversion formula 

/ = n*WTZf. ( 2 ) 

where W is a convolution operator acting only on the s-variable, and where 7^^ : x 

M —)■ 72^ denotes the back-projection operator, which integrates over all lines through a 
point, i.e., 

n*g{x) = [ g{9,x- 9). 

Js^ 
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The back-projection operator is adjoint to the Radon transform. The convolution opera¬ 
tor W can either be describes as a Hilbert transform followed by a derivation, both with 
respect to the variable s; or as a convolution operator with a transfer function being a 
suitable scaled version of |(t|, where a denotes the conjugate variable of s. 

The filtered back-projection algorithm has a time complexity of , if we assume 
that reconstructions are made on an iV x lattice and that the numbers of samples in 
s and 0 are both 0{N). This is because for each point x, integration has to be made 
over all lines {N directions) passing through that point. However, there are methods for 
fast O(A^^logA^) back-projection, see for instance [5l fTOl fTH] . 

Another class of fast methods inversion of Radon data goes via the Fourier-slice 
theorem. This is a result by which the Radon data can be mapped to a polar sampling 
of the unknown function in the frequency domain. To recover the unknown function, 
interpolation-like operations (for instances the ones presented in [TJ El El]) have to be 
applied in the frequency domain. The data in the frequency domain will typically be 
oscillatory, as seen in Figure If) and (at in increased resolution) in the lower right 
panel of Figure]^). Hence, in order to interpolate data of this type high interpolation 
order is required. In comparison, the data in the Radon domain will not be particularly 
oscillatory. This is illustrated in Figure]^) and the upper left panel of Figure]^). 
This fact implies that interpolation methods of moderate order can be expected to 
produce reasonable results. This means in turn that less time can be spent on conducting 
interpolation. It also gives more control over the interpolation errors, as local errors will 
be kept local in the Radon domain (and hence more easily distinguishable as artifacts 
in the reconstruction). 

In this paper, we discuss how to design fast algorithms for the application of the 
Radon transform and the back-projection operator by using the fact that they can be 
expressed in terms of convolutions when represented in log-polar coordinates laiiaiini 
[ 39 ] . In particular, we follow the approach suggested in [3|. This formulation turns out 
to be particularly well-suited for implementation on GPUs. A major advantage with 
using GPUs is that the routines for linear interpolation are fast. In fact, the cost for 
computing linear interpolation is the same as reading directly from memory m- This 
feature can be utilized for constructing fast interpolation schemes. In particular, in this 
paper we will work with cubic interpolation on GPU |32[ [37] . 

For the sake of comparison, we will provide performance and accuracy tests of the pro¬ 
posed method along with comparisons against other software packages for tomographic 
computations. We also conduct a performance comparison between the different meth¬ 
ods when utilized in iterative reconstruction techniques. The iterative methods rely on 
applying the Radon transform and the back-projection operator several times. An ad¬ 
vantage of keeping all computations on the GPU is that the needed time for GPU-GPU 
memory transfer can be reduced. 
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(a) (b) (c) (d) 


Figure 2: The (modified) Shepp-Logan phantom (a) and its Radon transform (b). The 
panel (d) shows the one-dimensional Fourier transform of the Radon transform with 
respect to s. The regions indicated by the black squares in (b) and (d) are shown in 
higher resolution in (c). 


2 The Radon transform and the back-projection expressed 
as convolutions 


We recapitulate some of the main ideas of the method described in [3] . A key part there 
is the usage of log-polar coordinates, i.e., 

{ xi = eP cos(0), 

X 2 = eP sin(0), 


where —vr < 6 < tt. To simplify the presentation, we identify f{0,p) with /(xi,X 2 ) 
if {0,p) in the log-polar coordinate system corresponds to the point x = (xi,X 2 ) the 
Cartesian coordinate system, and similarly for other coordinate transformations. 

By representing the distance between lines and the origin s = e^, and by a change of 
variables in 0 from Cartesian to log-polar coordinates the log-polar Radon transform 
can be expressed as 


nipf{e,p) = 


f{e', p')eP'6 fcos(0 - e') - eP-p') dp'dB' 


’ —77 j —OO 

P77 roo 


/ OO 

f{e>,p')ep'c{9-e',p-p') dp'de'. 

-OO 


(3) 


' —77 o —OO 


In particular, TZf{0, s) = 7^ip/(0, log(s)) for s > 0 (which is sufficient since the informa¬ 
tion for s < 0 is redundant). 

We briefly mention how to put this formula in a theoretical framework. Set S = 
(—TT, vr) X M and note that, for a compactly supported smooth function h on S, we have 


k{e.p)ae,p)dpM^ r'" 


• —77 o —OO 


' —7r/2 


cos Q 


which can be written as hdp where p, is an infinite measure on S. Hence, the formula 
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Figure 3: (a) Tangent lines to the circle to determine the support of the log-polar Radon 
transform function; (b) three angle spans to compute partial Radon transforms. 


extends by continuity to, e.g., all continuous compactly supported functions h in S. R 
follows that Q is well-defined whenever / is a continuous function which is zero in a 
neighborhood of 0 . 

As the Radon transform in the coordinate system [6, p) is essentially a convolution 
between / and the distribution C{6,p) = 6{cos{6) — e^), it can be rapidly computed by 
means of Fourier transforms. Special care has to be taken to the distribution an issue 
we will return to in what follows. Ignoring possible difficulties with the distribution (^, 
let us discuss how Q can be realized by using fast Fourier transforms. It is natural to 
assume that the function / has compact support (the object that is measured has to fit 
in the device that is measuring it). The compact support also implies that the Radon 
transform of / will have compact support in the s-variable. However, this is not true in 
the log-polar setting, since p —)■ —oo as s —)■ 0 . 

Note also that for any point x in the plane there is a direction for which there is a line 
passing through x and the origin. This implies that it is not possible to approximate the 
values of the Radon transform by using a finite convolution in log-polar coordinates if it 
is to be computed for all possible line directions. However, by restricting the values of 
9, and by making a translation so that the support of / is moved away from the origin, 
it is in fact possible to describe the partial Radon transform as a finite convolution, and 
then recover the full Radon transform by adding the contributions from various partial 
Radon transforms. The setup is illustrated in Fig. 

More precisely, let /3 be a fixed angle and let an, Or, Ti, T 2 and L 3 be as in Figure 
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[^). From the geometry we see that 


Or = 


1 + sin ( I 


and Or = 


cos 7T — sm w 


1 + sin ( I 


^ 0 

2 ’ 2 


then the 


Assume for the moment that / has support in the gray circle indicated in Figure 
In log-polar coordinates {O^p) it then has support inside of [log(l — 2aij),0] x 

If we restrict our attention to values of TZ\p{f) in the same sector 0 G “f > f : 
only nonzero values of 7?.ip(/) will be for p in the interval [logOrjO]. This means that 
^ip(/)(^)P) for these values can be computed by the finite convolution 

pp/2 pO 


f{e\p')ep\{e-e',p-p') de'dp\ 


(4) 


where {0,p) G 
convolution 


-/3/2 ^log(l-2afl) 

X [log Or , 0] . We now replace the integral Q by the periodic 


2 ’ 2 


P) = f f /(^'> p'V'Cp>er {0 -0',p- p') de'dp', 

J—8 Jlogiiar) 


l-P J\og{ar) 


(5) 


where Cper is the periodic extension of C defined on [log(ar),0] x [—/3,/3]. It is readily 


verihed that for {9, p) G 


P P 

2 ’ 2 


X [log Or, 0], it thus holds that 


nlf{e,p)=nf{e,ep). 

We refer to TZ\^ as the partial log-polar Radon transform. 

Note that, in analogy with the argument following Q, the formula can be written 
as a convolution between f{9',p')e^ and a hnite measure, whereas ([^ can be written 
as a convolution with a locally finite periodic measure. The above formulas are thus 
well defined as long as / is continuous (or even piecewise continuous) in the log-polar 
coordinates. We refer to [l3l Chapter 11], for basic results about convolution between 
functions and periodic distributions. 

The convolution setup for the Radon transform is depicted in Figure]^). The right¬ 
most black solid curve C shows p = log(cos(0)), (—/3 < 9 < f3), which is the support of 
( in X [log0^,0]. Let D denote the circle 

D = {(xi, X2) : (xi - 1 -k aaf + xl < a\} 


in log-polar coordinates. The black dots shows the perimeter of D, and the gray curves 
indicate translation of the curve C associated with the black dots on the circle, within 
the interval [—/l,/3]. There is a difference in grayscale for the points with 9 inside the 
range [—f, f], as only these values are of interest to us. Note that the smallest p-value 
of the contributing part in this interval is log(ar). Moreover, the red lines indicate 
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Figure 4: The effect on the support due to convolutions: a) Radon transform; b) 
Back-projection. 


parts of the translations of C outside [log(o,.),0] x [—/?,/3], and the blue lines shows 
how these curves are wrapped back into the domain [log(ar), 0] x [—/3, j3] by the periodic 
extension of C,. We see that these alias effects do not have any influence on the domain 
[log(a,.),0] X [-f, f]. 

We now describe how can be used to recover TZf for a function / with support 
in the unit circle. We split the angular variable into M different parts, 13 = jj. For 
m = 0,1,... M — 1 let Trn : —?• denote the change of coordinates 

. , _ ( cos(m/3) sin(m/3)\ /xA A-aR\ 

^\—sm{m/3) cos (m/5) y \X 2 j V / 

and let Tm/ = Note that 

TZf {9, s) = af^^TZ (Tmf) (o - ml3, aRS + (1 - ur) cos(6' - m/5)^. (6) 

The Radon transform can thus be computed for arbitrary 9 and 0 < s < 1, by using the 
relation 

T^f{0, s) = {Tmf) (9 - m/5, log {urs + (1 - ur) cos {9 - m/S)) ^ , (7) 

where m = [9/(3{modM)] and [x] denotes the rounding operator to the closest integer 
to X, and where mod denotes the modulus operator. 

We denote the change of coordinates above by 

Sm{9, s) = (^9 - m/5, log {urs + (1 - o/j) 003(0 - m/5)) ^, 

and we then have that 


TZf (9, s) = afi^TZl {Tmf) {Sm{9, s)). 
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We remark that for fixed 9, the connection (|^ between the Radon data in the 
different domains has an affine dependence on s. Since the filter operator W acts as a 
convolution operator with regards to s, its action will in principle be the same regardless 
if the coordinate transformation Tm is used or not. We use the notation Wip to denote 
the action of the filter operator in log-polar coordinates. 

The adjoint operator (back-projection) associated with the Radon transform 0 can 
be written as 

/ oo r 

/ g{9, s)5{x ■ 6 — s) dOds, 

-OO J 

cf. [25]. It is a weighted integral of g over lines passing through the point x, and just as 
for the Radon transform it can be expressed as a convolution in log-polar coordinates. 
We define 

/ OO /*7r 

/ p)^ cos(6» - 9') - l] d9'dp 

-OO J —77 ^ ^ 

/ OO r77 

/ 5 ( 6 '', pX* {p - p\9- 9') d9'dp. 

-OO J —77 


It is shown in 0 that TlX f{x) = 27^j^/(0, p), where the factor 2 comes from the fact 
that the corresponding integration in the polar representation {9, s) is only done in the 
half-plane s > 0. The log-polar back-projection operator has the same problem as the 
log-polar Radon transform in dealing with s = 0, and in a similar fashion we make 
use of partial back-projections in order to avoid this problem. Because of the relation 
Q and the fact that the filter operator W can be applied to each partial Radon data 
individually, it will be enough to consider partial back-proiections for Radon data with 


9 G 


-h m/3, 


-|- m/3 


according to the setup of Figure 


By applying to each 


of the partial back-projection, and summing up the results, we will recover the original 
function. For detailed calculations, we refer to [3|. 

The idea is thus to split the Radon data into M parts, where each part is transformed 
according to Figure]^). For each part, the filtered data is back-projected according to 


r f\{9',p')C*{p-p',9-0') d9'dp'. 

J— OO J 

Since we assumed that our original function had support inside the unit circle, we are only 
be interested in the contributions inside the disc D. Since only lines with p G [log(ar.), 0] 
go through this circle, the integration in the p variable above can be limited to p G 
[log(ar),0]. Similarly as for the Radon transform, we now want to write this (finite) 
convolution as a periodic convolution. Figure |^) illustrates how this can be achieved. 
The black solid lines show translations of the curves p = — log(cos(0)) representing 
the back-projection integral in the log-polar coordinates. The black dots now show the 
perimeter of a support of the Radon data, indicated by dark gray in the left illustration of 
Figure]^ The dark gray curves the back-projection illustration now show the translations 
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of /9 = — log(cos(0)) that will give a contribution inside the disc D. The light gray 
curves illustrate contributions that fall outside the support of D. The red curves show 
contributions that will fall outside the range [—/3,/3] x [log(ar.),0], and the blue curves 
show the effect when these lines are wrapped back in to the domain [—/?, /3] x [log(ar), 0]. 
We note that the blue curves do not intersect the circle. Hence, we define the partial 
log-polar back-projection operator as 

/■o /■f 

p)= I 9(0', p')C*, {p -p',e- 0') dO'dp'. (8) 

./log(ar) j-f 

where is the periodic extension of defined on [log(ar),0] x [—/?,/3], and note that 
for (9, p) corresponding to points inside the domain D, it holds that 

nf/g{9,p) = n*g{e,p). 

We then have that 

M-l 

2 ^ = fix) 

m,=0 

for all X in the unit disc. 


3 Fast evaluation of the log-polar Radon transform and 
the log-polar back-projection 

Let hhe a continuous function on some rectangle i? in and let phe a finite measure 
on R, and let pper be its periodic extension. Along the same lines as jlH Theorem 11.6-3], 
it is easy to see that the Fourier coefficients of their periodic convolution satisfies 

h * Ppev = hp, 

where hp is the pointwise multiplication of the respective Fourier coefficients with respect 
to R. We will use this formula and FFT to fast evaluate TZ\^ © and We use 

the notations 


fi0.p)= 

hg^hp 


kp^ 


27ri( fP 

' 2/3 - log ar 


,kpe 




a{f>, p) = sti.k 

kg,kp 

/-(a \ -) 

Ci0,p) = Cke,kpe V2/9 

kg,kp 

C*{0,p) = ^kg,kpe"'''^‘^^^y 


(9) 

( 10 ) 


kg,kp 
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The Fourier coefficients for the two distributions C and are given by 
rP /-o 


Cke,kp — 




/-/3 Jlog{ar) 

r0 rO 


Q -GkQ ^ pkp 

6{cos{9) — e^)e~ e~ dpdO = 


(5(cos(0) — e^)e {e^) -i°sM dpdO = 




l-d Jlog{ar) 


e {cos{6)) "^dO, 

-P 

rP /■-log(fflr) 


-•»%, -27ri /'i '-1 


5(e^cos(0) — l)e e -i°s{o,r) dpdO = 


5{e^ cos{6) — l)e 2/9 (e^) dpd6 = 


1-0 Jo 

rP r-log(fflr) 

l-p Jo 

/ P . 8kn n . kp 

e~ {cos{9)) ^^^°sMd6. 


®''e , „--27ri- 


pfcp 


( 11 ) 


( 12 ) 


Both the integrals on the right hand sides of © and ( |12[ ) are of the form 

rP 
l-P 


. 

P{p,a,l3)= / cos{6)°‘d9, 

J-0 


where p = —27r||, a = — 21 : 1 — 


log(ar) 


— 1 for © and a = — 27ri 


(13) 


log(ar) 


for (12), respec¬ 


tively. It turns out that there is a closed form expression for the integral (13), namely 


P{p.,a,(3) = 


r(^)r(i)r(^) 
r(^ + i)r(^ + i) 




(14) 


3fTi, “ +'^ + 1 , “ -+1; , 2 +2; COS(/3)d - 

ia + l){a + 2) ^ ^ V ’ 2 2 ’ 2 2 ’ 2 ’ 2 ’ J 

2cMPr» cos(^rt sin(/i) J, /j g ^ P ^ 1 g _ P ^ ^ ^ g ^ 

( a - hi ) •^^\^’2 2 ’2 2 ’ 2 ’2 ’ 

In the appendix we derive this expression. However, as it includes gamma and hyperge¬ 
ometric functions with complex arguments, special care has to be taken when evaluating 


(14) numerically, as cancelation effect easily can cause large numerical errors. For the 


case where there are not suitable numerical routines readily available for evaluation of 


these special functions, we briefly describe an alternative way to evaluate (13) numerl 


cally. Note that for p = —27r^, the integral is well suited for evaluation by FFT, since 


given range by evaluating the integral of (13) by the trapezoidal rule by means of FFT 


ff 

for a fixed value of kp (fixed a) we can obtain Cke,kp and for all integers kg in a 


However, for this procedure to be accurate, we need to oversample the integral, and make 
use of end-point corrections. The function cos(0)“ will be oscillatory, but the oscillation 
is determined by the fixed parameter a. Neglecting boundary effects, we can therefore 
expect the trapezoidal rule to be efficient, provided that sufficient oversampling is used. 
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For the boundary effects, we can make use of end-point correction schemes [2]. For 
the computations used in this paper, we have used an oversampling factor of 8 and an 
8 -order end-point correction with weights 


1 + 


1 

120960 


23681, 55688, -66109, 57024, -31523, 9976, -1375 


Suppose next that we only know values of / and 5 in ([^ and Q, respectively, on 
an equally spaced sampling covering [log(ar), 0] x [—/3, /3], i.e., / and g are known on the 
lattice 


jd jp 

2^e' -^og{ar)Np 


2 


< je < 


2 ’ 


0 <jp< Np, 


(15) 


with Nq = 2 


Ns 


2M 


We denote these values by fjg,jp and Qjgjp, respectively. In order 
3 e meaningful, we need to have continuous representations of / and g. 


for and ([^ to 

This is particularly important as C are distributions. A natural way to do this, 

is to define fkg,kp and dkg^kp by the discrete Fourier transform of fjgjp and gjgjp, i.e., let 


fkg,kp — ^ 


^-1 Np-l 


je=-^ JP=0 

lo 


Np Np 

if -^<ke<^, 0<kp<Np, 


otherwise. 


(16) 


and 


9kg,kp — ^ 


^-1 Np-l 

^2 9 jo. 






jg = -^ jp=0 

lo 


Np Np 

if -^<kg<^, 0<kp<Np, 


otherwise. 


(17) 

We can then use and ( |10[) to define continuous representations of / and g. Using 
this approach, the values ofQ and ^ are also well defined, namely 

• f ^^0 pkip 


kg ,/Cp 


and 


,_. . / dka pkp N 

<5(0, P) = E 9kg,kpC#kg,kpe 


kg,kp 


In the case where the two transforms are to be evaluated at {9,p) on the lattice (15), 
the corresponding sums above can be rapidly evaluated by used FFT. 


4 Sampling rates 

There are three different cases for which it is natural to use equally spaced discretization: 
For the representation of / in Cartesian coordinates {xi,X 2 ) covering the unit circle 
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for the polar representation {6,s) of the sinograms TZif)] and for the log-polar 
coordinates {9,p) for evaluation of TZ\p and We will use a rectangular grid in 

all three coordinate systems and interpolate data between them as we work with the 
different domains. 

In this section we derive guidelines for how to choose discretization parameters. 
These will be based on the assumption that / is “essentially supported” in a disc with 
radius N/2, in the sense that contributions from the complement of this disc can be 
ignored without affecting accuracy of our calculations. Due to the uncertainty principle, 
/ can not have its support included in this disc, since / itself is supported in a disc with 
radius 1/2, but it may work quite well in practice and is therefore still convenient to use 
for deriving sampling rates.^ We also base our arguments on rehnements of the Nyquist 
sampling rate, more precisely the Paley-Wiener-Levinson theorem. 

In the spatial variables {xi,X 2 )-, the Nyquist sampling rate (corresponding to the 
assumption on the support of /) is 1/N. This leads us to cover the unit circle with a 
grid of size N x N. We use 


X = 


n n 

N’ N 


N . . N 

-y < Jl,j2 < y- 


(18) 


To derive recommendations for the sampling in the polar coordinates {6, s) (for the 
entire TZf), recall the Fourier slice theorem, which states that 


f*00 roo 

/ f{s6 + t6^)e~^'^^^^dsdt = 

J —OO 


nf{s,e)e 


— 27Tisa 


ds = 


’ —OO 

/ OO 1*00 poo poo 

/ = / / f{x)e-^^^^^^>^dsdt = fia9). 

■OO J —OO J —OO J —OO 


• —OO J —OO 

COO COO 


Thus lZf{9,s) = J^~}^g{f{a9)), and hence the Nyquist sampling rate for s is As = 
yielding that Ng = N. Let A9p denote the angular sampling rate in the polar coordinate 
system. Since / is supported in the unit disc, the Nyquist sampling rate in the frequency 
domain ^ is 1 (on a rectangular infinite grid). However, the multidimensional version 
of the Paley-Wiener-Levinson theorem roughly says that it is sufficient to consider an 
irregular set of sample points whose maximal internal distance between neighbors is 1. 
Since we have assumed that the values of f{^) for |,f| > y are negligible, this leads us 
to choose A9p and Act (the latter will not be used) so that the polar grid-points inside 
this circle has a maximum distance of 1. It follows that 


A9p = 


2 

N' 


^More strict results can be obtained by introducing notions such as numerical support. Recall that 
any compactly supported A function has compact numerical support in both x and ^ by the Riemann- 
Lebesgue lemma. For instance, Gaussians have (small) compact numerical support in both the spatial 
and the frequency domain. The presentation will quickly become substantially more technical and we 
therefore follow the signal processing practice and use Nyquist sample rates despite not having infinitely 
long samples. 
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Figure 5: (a) Log-polar grid. Samples in the p variable are chosen in order not to lose the 
accuracy of measuring data, (b) The bandwidth of a partial back-projection function. 


Since we want to cover an angle span of [0, vr], this leads to Nq k, ^N. We denote the 
polar grid by 

N N 

S = {jeA^p, jsAs} , 0 < je < Ne, -y < js < —■ (19) 

Practical tomographic measurements can therefore typically have a ratio ^ ~ 1.5 
between the sampling rates in the 6 - and s-variables. We refer to [2Ul for more 
details, and proceed to discuss the sampling in the log-polar coordinates. 

To distinguish between the coordinate sampling parameters we use A0ip for angular 
sampling rate in the log-polar coordinates. Recall that we wish to accurately represent 
functions of the form T^f, which is a rotation, dilation (with factor 2 aji) and translation 
of /. The support of Tmf lies in the grey circle of Figure and its essential frequency 
support is then inside the disc of radius 

Figure illustrates the setup where the black lines indicate equally spaced samples 
of Oip] the blue curves indicate equally spaced sampling in sip; and where the red dashed 
curves indicate equally spaced sampling in p. Note that the maximum distance between 
points in the xi direction occurs when s = 1 or p = 0. It is thus clear that Asip should 
equal Axi = whereas Ap is determined by 

max eP — max e^fl —< Axi. 

pe[log(ar),0] pe[log(ar),0] V / 

As the largest distance occurs when p = 0 it follows that 

Ap<-log(l-^), 
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and consequently, since the total distance that is to be covered is — log(ar), that 


Np> 


log(«r) 

log(l - 


( 20 ) 


where the notation [x] denotes the nearest integer greater than or equal to x. 

For the determination of sample rate in the angular variable for the representation 
of Tmf in the disc D, we have that 

A^ip sin(A6»ip) < 

Hence, we introduce 


^ip — {jo1 jp^p} ) 




2A01P 


< 30 < 




2A6»i 


ip 


< jp ^ 0) 


( 21 ) 


for the representation of T^f in D. The representation of the partial Radon transform 
of Tmf needs a reduced sample rate compared to (21). As the Radon transform is 


applied as a convolution on a log-polar grid (i.e. by a Fourier multiplier), the higher 
frequencies in the 0-direction will not be needed. Hence, we can apply a low-pass filter 
in the 0-direction, and apply the FFT operation the grid 


— {jo^^^pi jp^p} ) 


Ne 

< jo < 

Ne 

[2M_ 


[_2 m\ 


Np < jp < 0, 


( 22 ) 


for the computation of the partial Radon tranform of Tmf- The grid Hp will also be 
useful when resampling Radon data in log-polar coordinates. 


5 Interpolation 


As mentioned in the previous section it will be natural to use equally spaced sampling 
in the Cartesian, polar and log-polar coordinate systems, respectively. We typically 
want to reconstruct data on an equally spaced Cartesian grid; the tomographic data is 
sampled in equally spaced polar coordinates; and both the Radon transform and the 
back-projection can be rapidly evaluated by FFT when sampled on an equally spaced 
log-polar grid. There are several ways to interpolate between these coordinate systems. 
In this work we will make use of cubic (cardinal) B-spline interpolation. We will discuss 
how to incorporate some of the interpolation steps in the FFT operations, and also 
discuss how the B-spline interpolation can be efficiently implemented on GPUs. 

The cubic cardinal B-spline is defined as 


B{x) 


'(3|x| 3 - 6|xp-g4)/6, 0<|x|<l, 

< (—|x|^-|-6|xp — 12x-|-8)/6, 1 < |x| < 2, 

,0, |x| > 2. 
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This function is designed so that it is non-negative, piecewise smooth, and at |a: 
0,1, 2. Note that at integer points, it holds that 


B{j) 


1 ifj=0, 

^ g if \j\ = 

0 otherwise. 

\ 


(23) 


When used as a filter, it will smooth out information and is thus acting as a low-pass 
filter. Consequently, it can not be used directly for interpolation. 

The Fourier series given by the coefficients in (23), which we denote hy B, is given 

B{^) = y ^ + 4 + ^ ^ cos(27rO. (24) 

6 V / 3 3 


by 


Suppose that equally spaced samples fk of a function / (in one variable) are available. 
We want to recover values of / at arbitrary points x using 


fix) = Y,iQf)kB 


k 

^~N 


(25) 


Since B has short support, only values of {Qf)k for k ~ xN will contribute in this 
sum. The operator Q is a pre-filter operation, that is compensating for the fact that 
convolution with B suppresses high frequencies. The pre-filter operation is boosting 
high frequencies in the samples fk- It can be computed in different ways. Perhaps the 
most direct way is to define Q in the Fourier domain (by the discrete Fourier transform), 
where it essentially becomes division by B (upon scaling and sampling). In this case, it 
is easy to see that the convolution with B and the pre-filter operation will cancel each 
other at points x = i.e., the original function is recovered at the sample points; which 

is a requisite for any interpolation scheme. 

As we will compute the Radon transform and the back-projection by means of FFT 
in log-polar coordinates, we can in some steps incorporate the pre-filter step 1/B in 
the Fourier domain, at virtually no additional cost. However, not all of the pre-filter 
operations can be incorporated in this way. While the pre-filter easily can be applied 
by separate FFT operations, we want to limit the total number of FFT operations, as 
these will be the most time-consuming part in the implementations we propose. 

As an alternative to applying the pre-filter in the Fourier domain, it can be applied 
by recursive filters. In |33j these operations are derived by using the Z-transform. 

It turns out that if we define 


(QHk = 6/(fe) + (v/3- 2)(Q/+)fc_i, 


then 

iQf)k = (V3 - 2)iiQf)k+i - (Q/+)fc), (26) 

cf. [331 equations (12,13)], where the boundary condition on Qf and Qf~^ are given in 
|33l equations (14,15)]. These filters can be efficiently implemented on GPUs. 
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Two-dimensional prefiltration can be done in two steps, one in each dimension. We 
will use the notation Qf for the pre-filtering also in this case. On a GPU this implies 
doing operations on rows and columns separately. However, there are highly optimized 
routines for transposing data, which means that the prefiltration can be made to act 
only on column data. This is done to improve the so-called memory coalescing and 
GPU cache performance m- Memory coalescing refers to combining multiple memory 
accesses into a single transaction. In this way the GPU threads run simultaneously and 
substantially increased cache hit ratios are obtained. 

Let us now turn our focus to the convolution step in (25). Let 


k = [iVxJ, 

a = X — [xj. 


The sum (25) then reduces to 


f{x) = wo{a){Qf)k-i + wi{a){Qf)k + W 2 {a){Qf)k+i + W 3 {a){Qf)k+ 2 , where (27) 
wo{a) = B {a — 2), wi{a) = B{a — 1), W 2 {a) = B{a), 1 x 3 ( 0 ) = B {a + 1). 


We now discuss how this sum can be evaluated using linear interpolators. As mentioned 
previously, linear interpolation is executed fast on GPUs. In 137! it is shown how this can 
be utilized to conduct efficient cubic interpolation. The cubic interpolation is expressed 
as two weighted linear interpolations, instead of four weighted nearest neighbor look-ups, 
yielding 2'^ operations instead of 4'^ for conducting cubic interpolation in d dimensions. 

We briefly recapitulate the approach taken in [32l[37j. Given coefficients (Qf)k, let 
Q/lin be the linear interpolator 


Qfiinix) = (1 - {a)){Qf)k + a(Qf)k+i 

= (l-(x- [x\))(Qf)iN^i + (x- [x\)(Qf)iN^i+i. 


The sum (27) can be then be written as 

f(x) = (woia) + wi{a))Qfun (^k - 1 + 
+ iw 2 ia) -h W3{a))Qfnn {k + l + 


wi(a) 


wo{a) + wi(a) 

mja) 

W2(a) + W3(a) 


(28) 


The evaluation of the function Qf\\n{x) can be performed by hard-wired linear interpo¬ 
lation on GPU. In modern GPU architecture the so-called texture memory (cached on a 
chip) provide effective bandwidth by reducing memory requests to the off-chip DRAM. 
The two most useful features of these kind of memory with regards to conducting B- 
spline interpolation, are 


1. The texture cache is optimized for the 2D spatial locality, giving best performance 
to GPU threads that read texture addresses that are close together. 
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2. Linear interpolation of neighboring values can be performed directly in the GPUs 
texture hardware, meaning that the cost for computing the interpolation is the 
same as reading data directly from memory. 


This implies that the cost for memory access in (28) will be two instead of four as only 
two function calls of Qfun are made in (28), and in two dimensions the corresponding 
reduction from 16 memory access operations to 4 will give significant improvement in 
computational speed. 


6 Algorithms 


We now have the necessary ingredients to present detailed descriptions on how to rapidly 
evaluate the Radon transform and the back-projection operator by FFT in log-polar co¬ 
ordinates. In the algorithms below, we let denote the values of the two-dimensional 

counterpart of (24), scaled to represent the sampling on {0,p) G 


2 ’ 2 


X [log Or, 0]. 


Algorithm 1 Fast Radon transform 


Given / sampled at A (18) compute Qf by (26) 
for m = 0,... M — 1 do 


Resample Tm/ at flip (21) by (25) 

Downsample from Dip (21) to Dp (22) 

Multiply result by 

Apply the log-polar Radon transform with pre-filtering incorporated, i.e., com¬ 
pute fkg,kp from (16), and evaluate 


27ri 

Jke,kp- - e 


PPP-Li k ^ 

2Ng Np 


kg,kp 




kg,kp 


by using FFT. 

7: Resample from S~^Dp (22) to S (19) by (25). 

8 : end for 


Let us end with a few remarks on time complexity. The most time consuming part of 
both algorithms are the convolutions that are implemented by FFT. In total, 2M FFT 
operations needs to be computed (including forward and backward FFT’s), and each 
operation will be done on a grid of size x Np. For M = 3, it follows from (20) that 


Np^- 


N log{ar 

2a_R 


2.1iV, 


and Np is monotonically decreasing for increasing M. It thus hold approximately that 
twice as many samples in p compared to that originally used for the sampling of s 
{Ng = A), and in the angular variable we also need to sample with twice as many 
parameters in order to avoid aliasing effects. In total we need to use an oversampling of 
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Algorithm 2 Fast back-projection 


Given g sampled at S (19) compute Qg by (26) 
for m = 0,... M — 1 do 

Resample g{Sm) at Up (22) by (25) 

Apply the log-polar back-projection with pre-filtering incorporated, i.e., compute 
9ke,kp from and evaluate 


^ ^^kg,kp 2-Ki 

/ > 9kg,kp^ e 

kg,kp kg,kp 


-^+jpkp 


by using FFT. 

5: Resample from (22) to A (18) by (25) to obtain partially back-projected 

data. 

6: end for 

7: Sum up the M partial back-projections. 


about 4 in the FFT operations. Note that this is the same oversampling that is generally 
needed for computing the convolution of two functions if aliasing is to be avoided. The 
total cost for applying the Radon transform and the back-projection operator is thus 
the same as would be expected for a generic convolution. 

In addition to the FFT operations, interpolation and one-dimensional filter oper¬ 
ations are needed in the implementations. In our simulations these operations will 
typically take about 25% of the total time. 

7 Performance and accuracy tests 

Let us start by briefly discussing the filters used in the filter back-projection ([^. This 
discussion is included in order to better interpret the errors obtained when comparing 
different methods. For theoretically perfect reconstruction (with infinitely dense sam¬ 
pling) the filter W in ([^ is given by 

iBrampCo-) = |a|. (29) 

This filter is sometimes referred to as the ramp filter. If the sampling rate is insufficient 
in relation to the frequency content of / (or the object upon which the measurements is 
conducted on), it can be desirable to suppress the highest frequencies in order to localize 
the effects of the insufficient sampling. There is a relation between one-dimensional 
filtering in the s-direction of Radon data, and a two dimensional convolution in the 
spatial domain |25l p.l02]. It can be explained by the Fourier slice theorem, which 
describes how Radon data can be converted to a polar sampling in the frequency domain 
by taking one-dimensional Fourier transforms in the s-direction. The application of the 
one-dimensional filter will then correspond to a change in amplitude (and possible phase) 
along the lines (indicated by dots with the same angles) in Figure]^). As there is no 
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Figure 6: Filters for Radon transform inversion. 


information outside the circle with radius iV/2, the action of the ramp filter will be 
equivalent of applying a two-dimensional convolution to the original function / using a 
two-dimensional filter with Fourier transform 


hF];'amp(0 


1 

0 otherwise. 


(30) 


We see that if / contains more high frequent information than the one prescribed by the 
sampling rate N, then the sharp cutoff in (30) can yield artifacts, and in the presence 
of noise in the Radon sampling the high-frequency boosting of (29) will boost the noise. 
Replacing the ramp filter (29) with a filter that goes smoothly to zero at the highest 
(discrete) frequencies, will thus yield an image that is slightly smoother, but on the other 
hand an image with suppressed high-frequency noise and artifacts due to incomplete 
sampling. Sometimes, the ramp filter is modified so that it does not reach zero but only 
reduces the high-frequency amplitudes. Two common choices of filters are the cosine 
and the Shepp-Logan filters, defined by 


Wcos(<7) 


wshicr) 


|o-|cos(^) 

0 

|fT|sinc(f) 

0 


otherwise. 


otherwise. 


(31) 

(32) 


The cutoff is made above the sampling bandwidth, as it will illustrate the practical effect 
that the filters have on measured data. The three filters are illustrated in the frequency 
domain in Figure The two-dimensional filters associated with these filters have Fourier 
representations 


WcosiO 


cos(^) if|C|<f, 

0 otherwise. 
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and 


vKsL(e) = 


smc 


0 


§) 

otherwise. 


respectively. As the Wcos goes to zero at the highest sampling rate, we can expect 
smaller error when using this filter compared to the others. On the other hand, the 
highest frequencies are suppressed, and the results reconstructions will look slightly less 
sharp. 

To illustrate the accuracy of the suggested implementation, we conduct some exam¬ 
ples on the Shepp-Logan [33] phantom. We use the modified version introduced in [381 
Appendix B.2]. The function used (/) is illustrated in the left panel of Figure]^ The 
phantom consists of linear combinations of characteristic functions of ellipses, and its 
support is inscribed in the unit circle. Since the Radon transform is a linear operator, 
TZf is a linear combination of Radon transforms of characteristic functions of ellipses. 
Since the Radon transform of the characteristic function of a circle can be computed 
analytically, analytic expressions are available for the Radon transform of the Shepp- 
Logan phantom by applying transform properties of shifting, scaling and rotation. This 
Radon transform is depicted in the right panel of figure 

The presence of a high-frequency discontinuity caused by the filter can cause artifacts, 
but also the discontinuity of the derivative at cr = 0. To avoid artifacts from this part, 
we apply end-point trapezoidal corrections. The effect of this correction can be seen 
around cr = 0 in Figure Our aim is to eliminate as much of the errors as possible to 
distinguish the errors that the resampling between the different coordinate systems used 
in the proposed methods introduce. 

In Figure we show some reconstruction results from the filtered back-projection 
using different methods and different filters. For the sake of quality comparison, we 
use the results from ASTRA Tomography Toolbox [28] . NiftyRec Tomography Toolbox 
|3T] and the MATLAB image processing toolbox"'"'^. The ASTRA Toolbox uses GPU 
implementations and the implementation is described in [271 mi. The NiftyRec Toolbox 
comes with both GPU and CPU implementations. The toolbox is described in [301 [29] . 

For the comparisons we have used ASTRA vl.5, NiftyRec v2.0.1, and MATLAB 
v2014b. Both ASTRA and NiftyRec provide MATLAB scripts demonstrating how to 
use call their routines. For the MATLAB tests, we have used the functions radon 
and iradon. Both these functions call compiled routines. To test the back-projection 
algorithm from Astra toolbox we based our test on the provided routine s014_FBP. The 
iterative tests in the next section are based on the script s007_3d_reconstruction. 

The comparison against the NiftyRec toolbox was done by using the back-projection 
part of the provided demo tt_demo_mleni_parallel for parallel beam transmission to¬ 
mography. The package provides the option to use either GPU or CPU routines. We 
provide timings for both cases. We use the setup of tt_demo_mlem_parallel for the 
iterative tests described in the next section. 

The reconstructions in the left column of Figurej^use the ramp filter (29); the recon¬ 
structions in the middle column use the Shepp-Logan filter (32); and the reconstructions 
in the right column use the cosine filter (|31|). The reconstructions were made on a 
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MATLAB NiftyRec ASTRA Log-polar 


Ramp 


Shepp-Logan 


Cosine 



Figure 7: Computational errors of filtered back-projection for the ramp; the Shepp- 
Logan; and the cosine filters for different methods. 
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Table 1: Computational time (in seconds) of the back-projection for sizes {Nq x Ng) = 
(|iV X N) (excluding initialization time). 


N 

Log-polar 

ASTRA 

NiftyRec (GPU) 

NiftyRec (GPU) 

MATLAB 

256 

1.6e-03 

l.Oe-02 

7.1e-02 

1.3e00 

3.1e-01 

512 

6.1e-03 

3.5e-02 

5.6e-01 

1.3e01 

2.4e00 

1024 

2.5e-02 

1 .8e-01 

4.4e00 

1 .2e02 

1.9e01 

2048 

9.9e-02 

1 .2e00 

3.5e-F01 

1.0e03 

1.5e02 


512 X 512 grid using 768 samples in the angular direction. For the log-polar reconstruc¬ 
tion, we used M = 3 partial reconstructions, and Np = 2N. As the cosine filter goes 
to zero at the boundary, the errors in the reconstruction the right column should better 
represent the errors caused by the sampling parameters. The reconstruct panels have 
inscribed £^-errors against the filtered versions of the phantom. We can see that ASTRA 
and NiftyRec produce almost identical errors. We note that the proposed method seems 
to give the smallest error out of the four methods. The NiftyRec reconstructions refer 
to the GPU implementation. 

In Table we show times for computing of one back-projection by the different 
methods mentioned above. To exclude initialization effects and assure high GPU load, 
the times in Table are obtained by executing batches of reconstructions (to fill up the 
GPU memory) in one function call. 

We note that the proposed method is about 5-10 times faster than ASTRA, substan¬ 
tially faster than both the GPU and the GPU implementations of NiftyRec, and up to 
1500 times faster than the routines available in the MATLAB toolbox. 

For the tests, we have use a standard desktop computer with a Intel Gore 17-3820 
processor and a NVIDIA GeForce GTX 770 graphic card. We have used NVIDIA GUDA 
cuFFT library for the FFT operations, and we have used version 7 of the GUDA Toolkit 
|12j . The computations were performed in single precision. The written GPU program is 
optimized by using NVidia Nsight profiler, and GPU memory usage, kernels occupancy, 
instructions fetch and other performance counters were analyzed and improved by using 
strategies described in [251 IMj ■ 

8 Iterative methods for tomographic reconstruction 

In some situations it is preferable to use an iterative method for doing reconstruction 
from tomographic data. This could for instance be because of missing data, e.g., that 
data for some angles are missing; suppression of artifacts [23]; or that additional infor¬ 
mation about the noise contamination can be used to improve the reconstruction results 
compared to direction filtered back-projections Q. Iterative reconstruction methods 
rely on applying the forward and back-projection operators several times. Iterative 
algorithms can be computationally expensive when a large number of iterations are re¬ 
quired for the algorithm to converge. For that reason, fast algorithms for computing the 
Radon transform and the associated back-projection can play an important role. 
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Iterative algebraic reconstruction techniques (ART) are popular tools for reconstruc¬ 
tion from incomplete measurements [E]. It aims at solving the set of linear equations 
determined by the projection data. Transmission based tomographic measurements mea¬ 
sure the absorption of a media along a line. This puts a sign constraint on the function 
we wish to recover. It is clearly not ideal to assume the data is contaminated by normally 
distributed noise (for which the best estimate is given by the least squares estimates). 
A more reasonable assumption is that the added noise has a Poisson distribution |32] • 
The simplest iterative method for solving the estimation/reconstruction problem un¬ 
der this noise assumption is by the Expectation-Maximization (EM) algorithm. The 
EM-algorithm is well-suited to reconstruct tomography data with non-Gaussian noise 
[IIlEslIM!. Alternative techniques include for instance the Row Action Maximum Like¬ 
lihood Algorithm (RAMLA). 

More details about the EM-algorithm can be found for instance in [9l ll2]. In our 
notation, it can (given tomographic data g) be expressed as the iterative computations 
of 



where the function xc(^) s) is one if the line parameterized by {6, s) passes through the 
unit disc (the support of /) and zero otherwise. In each step, a Radon transform 7^/^ 
and a back-projection ( 7 ^) computed. 

It is well-known that a crucial part of GPU computations is host-device memory 
transfers. Eor an iterative method such as the one described above, it is possible to keep 
all the necessary data in the GPU memory, and thereby limit the data copying between 
host and device memory to an initial guess /^, the measured data g, and the hnal result. 
As most methods, the proposed method require some initialization steps (e.g., geometry 
parameters and convolution kernels C^) 

The obtained GPU program was tested on Radon data with Poisson noise, cf. Eig- 
ure[^). Eigure [^) shows the result of applying the filtered back-projection formula, 
whereas figure demonstrates de-noised recovered data after 50 iterations of the EM- 
algorithm. In Table we present performance results for the EM-algorithm for the 
proposed method and the methods mentioned in the previous section. The computa¬ 
tional times for conducting 100 iterations of the EM-algorithm for different resolution 
parameters are presented. The Radon data was sampled with parameter Nq = |A^, 
and Ns = N. We used M = 3 partial back-projection for the proposed log-polar based 
algorithms. The speedup from the previous section is conhrmed also for this case. 

9 Conclusions 

We have described how to implement the Radon transform and the back-projection as 
convolution operators in log-polar coordinates efficiently on GPUs. We present sampling 
conditions; provide formulas and numerical guidelines for how to compute the kernels 
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Figure 8: Reconstruction of Radon data with noise. Radon data with Poisson noise is 
shown in the left panel, the result of applying filtered back-projection is shown in the 
middle panel, the right panel shows the result after 50 iterations of the EM-algorithm. 


Table 2: Time (in seconds) for 100 iterations of the EM-algorithm for reconstruction 3d 
data of sizes {N x N x N). 


N 

Log-polar 

ASTRA 

NiftyRec (GPU) 

MATLAB 

256 

512 

1024 

2048 

8 .8e-b01 

6.9e-|-02 

5.4e-b03 

4.2e-b04 

3.1e-d02 

2.3e-b03 

2.5e-d04 

3.5e-d05 

3.6e-|-03 

5.8e-d04 

9.3e-d05 

1.6e-|-04 
2.6e-d05 (*) 
4.2e-d06 (*) 

* - estimated 

1 by using a reduced number of slices. 


associated with the Radon transform and the back-projection operator; and discuss how 
the convolutions can be rapidly evaluated by using EET. The procedure involves sev¬ 
eral steps of interpolation between data in the Radon domain; the spatial domain; and 
the log-polar domain. R is comparatively favorable to conduct interpolation in these 
domains compared to conducting interpolation in the Eourier domain, as the functions 
will tend to be less oscillatory there. We use cubic spline interpolation which can be effi¬ 
ciently implemented on GPUs, and optimized routines for PET on the GPU. We conduct 
numerical tests and see that that we obtain at least as accurate results as that produced 
by other software packages, but obtained at a substantially lower computational cost. 
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Appendix 


In this appendix, we present the exact result of the integration in (13). By applying two 
partial integrations, it follows that 


- a, p) = 

f d? 

J-P 


f 

dip 


dip 


cos(v9)“d(^ = cos(v9)“|P^ 

Aew) Y-. - -(^)") l!, + £ ew (£ d, = 


COs((/3)" ) dip = 


cos((/?)" ^ {ipL cos{ip) + a sin 
e*^‘^cos((/?)"“^(i/rcos((/?) + a sin 
e*^‘^cos(v9)"“^(i/rcos(v9) + a sin 


/3 rP 


/ 




+ / { :7T2 ) dip = 


B fd . / 

_l_ / a(a — 1) cos(v9)"“^ — cos((/9)" jdy? 


/? 


+ a{a — l)P{pL, a — 2,(3) — a^P{pL, a, (3). 


Hence, we obtain the following recursive relation for P 

, 2 ' 


P{n,a- 2,(3) = 


a 


a — 1 


1 - ^ ) P{n, a, (3) + h{p., a, (3), 


where 


h{n,a,(3) = 


(/xcos(/5)“ sin(/i/3) — acos“ ^(/3) cos(/i/3) sin(/3)). (33) 


a{a — 1) 

For all positive n it thus holds that 


P{n,a- 2,(3) = Y[ 

/c=0 

n j-1 


En 

^■=0 k=0 


Oi “h 2 /l 

a + 2k — 1 


a + 2k 
a + 2k — 1 


n a 


no 


k=0 
n 


fe =0 




(a + 2k)'^ 


r(f + n) ^ ^ ■■2 


r(^ + n) r(f) 


n (1 


fc =0 

' “ 

. 2 


Ai 


(a + 2/c)2 


{a + 2A:)2 

+ 2j,/3) = 

P{pL, a + 2n, /?)+ 


P(/i, a + 2n,/?)+ 


fc=o 


^ r(^) r(f+j-i) ^-^ 

,t;r(!ti+i-i) r(|) 

^ (f ~ 5’ i) Y\ ((A - 

n R/'Q: 1 1\ i~l / 2 

\ ^ ^V2 2’ 2 I TT f n ^ 

^ 2 2> 2 I A:=0 ^ 


UP 




(a + 2fc)2 


h{n, a + 2j, (3) = 


P{p,a + 2n,(3)+ 


{a + 2 k)‘^ 


h{fj.,a + 2j,/3). 
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where B is the beta function, defined by 

^7r/2 

B{m,n) = 2 / {cos {sin 


( 34 ) 


Above we have used some of the properties of the gamma function [U p. 256] and its 
relation to the beta function B [U p. 258], 

r(m)r(n) 


B{m, n) = 


r(m + n) 


In the limit case n —)• oo, it holds that 


P(/i, a — 2, f3) = lim 

n^oo 


n 




B{n + q- 2,2) Ln 


n to fa 11', i-1 

^ 2 2’ 2 I TT j 1 

+ « _i ii li r 

j=0 w ^ 2 2 ’ 2 / r-i 


lim 

n—>00 


2’ 2^ k=0 
n 


2 2^ 2! k=0 

,2 




{a + 2kY 


P(/r, a + 2n, /3)+ 




{a + 2kY 


+ 2j,/3) 




2 2 


fc =0 


\a+2n 


{a + 2kr)jy B{n + ^-llf^^ 


n to fa 11', i-1 

^ 2 2’ 2 I TT ( 1 

_ 1 1 ) li I 

=n ^ 2 2’ 2'' k=0 


j=0 




(a + 2kY 


/i(/i,a + 2j,/3) 


(35) 


Using (34), it is easy to see that the sequence of functions 

(cost)"- 

^ K/n _ 1 li 
2 2 ’ 2 I 

tends to 5{t) when n —>■ 00 . Using this fact and ( [3^ we can represent P{^,a,[3) in the 
form P(/U, a, /3) = Po{b, a, /3) + Pi(/U, a, /3), where 

Po(p«,/3)=pf^+^,j)nfi 


Pi{n,a,l3) = ^ 


i3(f+ i,i) 


k=l ^ 

1 l^ i-1 


{a + 2k)'^ 


fpnb 




^P(j + f + i,i) (« + 2fc)^ 

From the identity [U p. 336] 

r2(n + 1) 


/i(/i,a + 2j,/3). 


r(n + xi + l)r(n — xi + 1) 


k=l 


n 1 + 


(n + k) 


2 / ’ 


(36) 


it follows that 
Po(/U,a,/3) = B 

= B 


a + 1 1 
2 ’ 2 


a + 1 1 


nb 


fc=i 




(a + 2kY 

r2(| + i) 


r(^)r(i)r(^) 


2 ’2; r(f+ f+ i)r(f-f+ 1) r('^±^ + i)r(^ + i)' 
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Concerning the second term, we have that 


Pi(/i, a,/3)= lim 

n^oo 


'-ri B{j + 2±i. i 


E 


-h{n, a + 2 + 2j) 
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where h{pL,a) = (/i cos(/3)“ sin(^/3) — acos“ ^(/3) cos(///3) sin(/3)). By using the 

identity ph]) again, we can rewrite 
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_e!_i 

(a + 2 ky J 


r^(f + i)r(f + f + j + i)r(f _ ^ ^ +1) 
r(f + f + i)r(f - f + i)r2(f + j +1) 


An expression for Pi(/i, a,/3) thus reads. 


p,(« a fl) = f Mm a + 2 + ) r(¥)r(j + i)r(j + j+i + i)r(j-;+j + i) 

oo 
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(f + 2 + ^)j(§ ~ 2 + ^)i 

(^).(f +Pa¬ 


using the Pochhammer symbol [U p. 256] 


ix)j = =x{x + l)...{x + j-l). 


By using the definition of h (|33|), we split the sum into two parts, 

OO 
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Now we can use the facts that (1)^ = j! and that 

I , ^ ^ r(z + j + i) ir(2; + j + i) 1 
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to simplify the above expression as 
P\{n,a,f5) = 2/icos(/3)"’''^ sin(/r/3) 
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The sums above take the form of 3 T 2 hypergeometric functions 
it holds that 
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2^ cos(/3)"’''^ sin(^/3) 

[a + l)((a + 2 ) 

2 cos(/3)“’''^ cos{Pii) sin(/3) 

(a + 1 ) 

We then finally obtain that 
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